TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning

Authors
Abstract

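The technique named in the title — a TD update for the variance of the return — can be sketched by learning the second moment M(s) = E[G² | s] alongside the value V(s), with Var(s) = M(s) − V(s)². Since G = r + γG′, the second-moment target is r² + 2γrV(s′) + γ²M(s′). The sketch below is an illustrative reconstruction under these standard assumptions, not necessarily the paper's exact formulation; the two-state chain is a made-up toy example.

```python
import random

# Toy chain: A --r1 in {0,2}--> B --r2=1--> terminal T, gamma = 1.
# Return from A is 1 or 3, so E[G|A] = 2 and Var[G|A] = 1.
random.seed(0)
gamma = 1.0
V = {"A": 0.0, "B": 0.0, "T": 0.0}   # value estimates E[G|s]
M = {"A": 0.0, "B": 0.0, "T": 0.0}   # second-moment estimates E[G^2|s]
n = {"A": 0, "B": 0}                 # visit counts for sample-average steps

for _ in range(30000):
    r1 = random.choice([0.0, 2.0])
    for s, s2, r in [("A", "B", r1), ("B", "T", 1.0)]:
        n[s] += 1
        a = 1.0 / n[s]
        # ordinary TD(0) update for the value
        V[s] += a * (r + gamma * V[s2] - V[s])
        # TD update for the second moment, from G^2 = (r + gamma*G')^2
        m_target = r * r + 2 * gamma * r * V[s2] + gamma**2 * M[s2]
        M[s] += a * (m_target - M[s])

var_A = M["A"] - V["A"] ** 2   # should approach 1.0 as V["A"] approaches 2.0
```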

Similar articles

Mean and variance responsive learning

Decision makers are often described as seeking higher expected payoffs and avoiding higher variance in payoffs. We provide some necessary and some sufficient conditions for learning rules, which assume the agent has little prior and feedback information about the environment, to reflect such preferences. We adopt the framework of Börgers, Morales and Sarin (2004, Econometrica) who provide similar re...


Asymptotic algorithm for computing the sample variance of interval data

The problem of the sample variance computation for epistemic interval-valued data is, in general, NP-hard. Therefore, known efficient algorithms for computing variance require strong restrictions on admissible intervals like the no-subset property or heavy limitations on the number of possible intersections between intervals. A new asymptotic algorithm for computing the upper bound of the samp...
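The hardness mentioned above can be made concrete: the sample variance is convex in each data point, so its maximum over a box of intervals is attained at interval endpoints, and a naive exact method must search all 2^n endpoint combinations. The sketch below illustrates that brute force on a toy input; it is not the article's asymptotic algorithm.

```python
from itertools import product

def sample_variance(xs):
    # unbiased sample variance with the (n - 1) divisor
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def max_interval_variance(intervals):
    # Variance is convex in each coordinate, so the maximum over the box
    # of intervals is attained at a vertex: enumerate all 2^n endpoint
    # choices.  This exponential cost is why the general problem is hard.
    return max(sample_variance(p) for p in product(*intervals))

intervals = [(0.0, 1.0), (0.5, 0.5), (2.0, 3.0)]   # toy epistemic intervals
ub = max_interval_variance(intervals)              # upper bound, here 31/12
```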


Variance Reduction Methods for Sublinear Reinforcement Learning

This work considers the problem of provably optimal reinforcement learning for (episodic) finite-horizon MDPs, i.e. how an agent learns to maximize his/her (long-term) reward in an uncertain environment. The main contribution is in providing a novel algorithm — Variance-reduced Upper Confidence Q-learning (vUCQ) — which enjoys a regret bound of Õ(√(HSAT) + HSA), where T is the number of time...


The kNN-TD Reinforcement Learning Algorithm

A reinforcement learning algorithm called kNN-TD is introduced. This algorithm has been developed using the classical formulation of temporal difference methods and a k-nearest neighbors scheme as its expectations memory. By means of this kind of memory the algorithm is able to generalize properly over continuous state spaces and also to benefit from collective action selection and learning ...
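The idea described above can be sketched roughly as follows: stored prototype states act as the expectations memory, the value of a continuous state is a weighted average over its k nearest prototypes, and the TD error is spread back over those neighbors. The prototype grid, inverse-distance weights, and toy random-walk task below are assumptions for illustration, not the paper's exact scheme.

```python
import random

random.seed(1)
prototypes = [i / 10 for i in range(11)]   # fixed 1-D prototype grid on [0, 1]
values = [0.0] * len(prototypes)
k, alpha, gamma = 3, 0.1, 0.9

def knn_weights(s):
    # indices of the k nearest prototypes, with normalized inverse-distance weights
    idx = sorted(range(len(prototypes)), key=lambda i: abs(prototypes[i] - s))[:k]
    w = [1.0 / (abs(prototypes[i] - s) + 1e-6) for i in idx]
    z = sum(w)
    return idx, [wi / z for wi in w]

def value(s):
    idx, w = knn_weights(s)
    return sum(wi * values[i] for i, wi in zip(idx, w))

def td_update(s, r, s2, terminal):
    target = r if terminal else r + gamma * value(s2)
    delta = target - value(s)
    idx, w = knn_weights(s)
    for i, wi in zip(idx, w):
        values[i] += alpha * wi * delta   # spread the TD error over the k neighbors

# Toy task: random walk on [0, 1]; reward 1 on reaching s >= 1, else 0.
for _ in range(2000):
    s = 0.5
    for _ in range(200):
        s2 = min(max(s + random.choice([-0.1, 0.1]), 0.0), 1.0)
        done = s2 >= 1.0
        td_update(s, 1.0 if done else 0.0, s2, done)
        if done:
            break
        s = s2
```
The learned values should increase toward the rewarding end of the state space, illustrating how the kNN memory generalizes between nearby continuous states.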


The Tail Mean-Variance Model and Extended Efficient Frontier

In portfolio theory, it is well known that the distributions of stock returns often have non-Gaussian characteristics. Therefore, we need non-symmetric distributions for modeling and accurate analysis of actuarial data. For this purpose and for optimal portfolio selection, we use the Tail Mean-Variance (TMV) model, which focuses on rare but high-loss risks that usually occur in the tail of r...
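A common form of the TMV criterion combines the tail conditional expectation with a penalized tail variance, TMV = E[X | X > q] + λ·Var[X | X > q], where q is a high quantile of the loss distribution. The sketch below computes an empirical version on toy data; the tail level p, the weight λ, and the simple quantile estimator are illustrative choices, and the article's exact formulation may differ.

```python
def tail_mean_variance(losses, p=0.95, lam=0.5):
    # Empirical TMV at tail level p: conditional mean plus lam * conditional
    # variance of the losses beyond the empirical p-quantile.
    xs = sorted(losses)
    q = xs[int(p * len(xs))]              # simple empirical p-quantile
    tail = [x for x in xs if x > q]       # observations in the upper tail
    m = sum(tail) / len(tail)             # tail conditional expectation
    v = sum((x - m) ** 2 for x in tail) / len(tail)   # tail variance
    return m + lam * v

losses = [float(i) for i in range(1, 101)]   # toy losses 1..100
tmv = tail_mean_variance(losses)             # tail {97..100}: 98.5 + 0.5*1.25
```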



Journal

Journal title: Transactions of the Japanese Society for Artificial Intelligence

Year: 2001

ISSN: 1346-0714, 1346-8030

DOI: 10.1527/tjsai.16.353